COBRA: A Framework for Continuous Profiling and Binary Re-Adaptation

Authors

  • Jinpyo Kim
  • Wei-chung Hsu

Abstract

Dynamic optimizers have been shown to improve the performance and power efficiency of single-threaded applications. Multithreaded applications running on CMP, SMP, and cc-NUMA systems also exhibit opportunities for dynamic binary optimization. However, because existing dynamic optimizers are designed primarily for single-threaded programs, they lack efficient schemes for monitoring multiple threads, and therefore cannot support thread-specific or system-wide optimizations that target the collective behavior of those threads. Monitoring and collecting profiles from multiple threads exposes optimization opportunities not only within a single core but also across multi-core systems, including their interconnection networks and cache coherence protocol. Compared with prior dynamic optimizers for single-threaded programs, the dynamic binary optimizer presented in this thesis can detect global phases of multithreaded programs and select appropriate optimizations by considering interactions between threads, such as coherence misses. This thesis presents COBRA (Continuous Binary Re-Adaptation), a dynamic binary optimization framework for both single-threaded and multithreaded applications. It includes components for collective monitoring and dynamic profiling, profile and trace management, code optimization, and code deployment. The monitoring component collects hot branches and performance information from multiple working threads with the support of the OS and the hardware performance monitors, and sends this data to the dynamic profiler. The dynamic profiler accumulates performance-bottleneck profiles, such as cache miss information, along with hot branch traces. The optimizer generates new optimized binary traces and stores them in the code cache. The profiler and optimizer interact closely to achieve more effective code layout and fewer data cache miss stalls.
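The profiler's job of merging hot-branch samples from several threads into one system-wide profile can be illustrated with a small sketch. It assumes each sample is a hypothetical (thread_id, branch_pc, target_pc) tuple, such as a hardware branch-trace buffer might supply, and a hypothetical hot_threshold; COBRA's actual data structures and hotness criteria are not described in this abstract.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Hypothetical sample format: (thread_id, branch_pc, target_pc), e.g. as read
# from a hardware branch-trace buffer on one of several worker threads.
BranchSample = Tuple[int, int, int]


@dataclass
class HotBranchProfiler:
    """Merges sampled taken branches from all threads into one profile."""

    hot_threshold: int  # hypothetical: minimum samples for an edge to be "hot"
    counts: Counter = field(default_factory=Counter)

    def add_samples(self, samples: Iterable[BranchSample]) -> None:
        # The thread id is ignored, so the counts describe the collective
        # (system-wide) behavior of all threads.
        for _tid, src, dst in samples:
            self.counts[(src, dst)] += 1

    def hot_branches(self) -> List[Tuple[int, int]]:
        # Branch edges sampled at least hot_threshold times, hottest first.
        return [edge for edge, n in self.counts.most_common()
                if n >= self.hot_threshold]


profiler = HotBranchProfiler(hot_threshold=2)
profiler.add_samples([(0, 0x400, 0x380), (1, 0x400, 0x380), (1, 0x520, 0x600)])
print([(hex(s), hex(d)) for s, d in profiler.hot_branches()])  # [('0x400', '0x380')]
```

Here the edge 0x400 to 0x380 becomes hot only because two different threads sampled it, which is the kind of collective behavior a per-thread profile would miss.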
The continuous profiling component monitors only the performance behavior of optimized binary traces and generates feedback on the efficiency of the applied optimizations to guide continuous re-optimization. COBRA is currently implemented on Itanium 2-based CMP, SMP, and cc-NUMA systems. This thesis also proposes a new phase detection scheme, with hardware support aimed specifically at dynamic optimization, that effectively identifies and accurately predicts program phases by exploiting program control flow information. The scheme applies to single-threaded programs and can be applied even more efficiently to multithreaded programs. It identifies dynamic intervals: contiguous, variable-length intervals aligned with dynamic code regions that show distinct phase behavior in both single-threaded and parallel programs. Two efficient phase-aware runtime program monitoring schemes are implemented in the COBRA framework and studied: sampled Basic Block Vector (BBV)-based and sampled Hot Working Set (HWSET)-based program phase detection. We show that sampled HWSET-based phase detection achieves higher phase coverage and longer stable phases than sampled BBV-based phase detection. We also propose dynamic code region (DCR)-based program phase detection hardware for dynamic optimization systems, and show that it exhibits the desired characteristics of a phase detector for dynamic optimization. Finally, this thesis proposes a persistent dynamic profile management scheme for continuous re-optimization. The code-region-based profile manager stores dynamic control flow information, including hot paths and loops, and classifies it according to an entropy computed from the frequency vectors of taken branches and load latencies.
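The two sampled phase-detection schemes compared above can be sketched as follows. This is an illustration under stated assumptions, not COBRA's implementation: sampled PCs stand in for basic blocks and code regions, BBVs are compared with the Manhattan distance commonly used for basic block vectors, and the HWSET comparison uses a Jaccard overlap chosen here for illustration. All function names, thresholds, and the min_count hotness cutoff are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def sampled_bbv(pcs: Iterable[int]) -> Dict[int, float]:
    """Normalized basic block vector built from one interval's sampled PCs."""
    counts = Counter(pcs)
    total = sum(counts.values())
    return {pc: n / total for pc, n in counts.items()} if total else {}


def bbv_distance(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Manhattan distance between normalized BBVs: 0 = identical, 2 = disjoint."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys())


def sampled_hwset(pcs: Iterable[int], min_count: int) -> Set[int]:
    """Hot working set: PCs sampled at least min_count times in the interval."""
    return {pc for pc, n in Counter(pcs).items() if n >= min_count}


def hwset_similarity(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap of two hot working sets (assumed metric): 1 = same set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def phase_changed_bbv(prev, cur, threshold: float = 0.5) -> bool:
    # Hypothetical threshold on the BBV distance between consecutive intervals.
    return bbv_distance(sampled_bbv(prev), sampled_bbv(cur)) > threshold


def phase_changed_hwset(prev, cur, min_count: int = 3,
                        threshold: float = 0.5) -> bool:
    # Hypothetical threshold on HWSET overlap between consecutive intervals.
    sim = hwset_similarity(sampled_hwset(prev, min_count),
                           sampled_hwset(cur, min_count))
    return sim < threshold


# Three intervals of sampled PCs: the first two share hot code, the third does not.
i1 = [0x10] * 6 + [0x20] * 4
i2 = [0x10] * 5 + [0x20] * 5
i3 = [0x30] * 10
print(phase_changed_bbv(i1, i2), phase_changed_bbv(i2, i3))      # False True
print(phase_changed_hwset(i1, i2), phase_changed_hwset(i2, i3))  # False True
```

In this sketch the HWSET comparison ignores relative frequencies among the hot PCs, so a shift in how execution time is split across the same hot code does not register as a phase change, whereas a large enough shift would move the BBV distance.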
This profile characterization and classification minimize the explosion of persistent runtime profiles and the overhead of profile collection for continuous re-optimization. We implemented two dynamic compiler optimizations that reduce the impact of coherent memory accesses in the OpenMP NAS Parallel Benchmarks, and use these benchmarks to show how COBRA adaptively chooses appropriate optimizations as the observed runtime program behavior changes. The optimizations improve the performance of the OpenMP NAS Parallel Benchmarks (BT, SP, LU, FT, MG, CG) by up to 15% (4.7% on average) on a 4-way Itanium 2 SMP server, and by up to 68% (17.5% on average) on an SGI Altix cc-NUMA system.
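The entropy-based classification of code-region profiles described above can be sketched as follows, assuming Shannon entropy (in bits) over each frequency vector and assuming load latencies are bucketed into a histogram. The classify_region helper, its single threshold, and the "stable"/"volatile" labels are hypothetical; the abstract does not specify how COBRA maps entropy values to profile classes.

```python
import math
from typing import Sequence


def entropy(freqs: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a frequency vector (e.g. taken-branch counts)."""
    total = sum(freqs)
    h = 0.0
    for f in freqs:
        if f > 0:  # zero-count entries contribute nothing (and avoid log2(0))
            p = f / total
            h -= p * math.log2(p)
    return h


def classify_region(branch_freqs: Sequence[int], latency_freqs: Sequence[int],
                    threshold: float = 1.0) -> str:
    """Hypothetical two-class scheme: low entropy in both vectors = 'stable'."""
    if entropy(branch_freqs) <= threshold and entropy(latency_freqs) <= threshold:
        return "stable"    # behavior concentrated on few branches / latency bins
    return "volatile"      # behavior spread out; profile likely to keep changing


print(entropy([16, 0, 0, 0]), entropy([8, 8]), entropy([1, 1, 1, 1]))  # 0.0 1.0 2.0
print(classify_region([16, 0, 0, 0], [8, 8]))  # stable
print(classify_region([1, 1, 1, 1], [16]))     # volatile
```

A low-entropy region is dominated by a few branches or latency buckets, so a single stored profile is likely to remain representative; a high-entropy region would need more frequent re-profiling, which is one way such a classification could bound profile growth.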

Publication date: 2008